ABSTRACT


Data-driven validation of the FOCUS model’s capacity to predict the dynamics of
social identity group (SIG) formation would allow the project to confirm the validity of
the theoretical mechanisms encoded in the model. However, such efforts are currently
inhibited by an absence of high-resolution, geo-spatially registered SIG data that
could be systematically compared to the model's predictions. A spatio-temporal map of
identity fragmentation in social media discourse would provide an ideal empirical target
for FOCUS, allowing FOCUS to leap-frog competing projects which lack empirically
validated, predictive capabilities, and consequently fail to satisfy the promise of
generating believable probability distributions over the potential outcomes of operations
intended to stabilize a region. This project, as a proof of concept, attempts to lay the
foundation for the use of social media data to identify social variables that can be used to
model acts of collective violence in Nigeria and that can later be used to validate the FOCUS
model.
SECTION 1. INTRODUCTION


“Validating the FOCUS Model through an Analysis of Identity Fragmentation in
Nigerian Social Media” is a project that was designed to gain valuable spatio-temporal data from
social media sources. The results from this initial analysis are intended to eventually support the
validation of the FOCUS model’s capability to predict the dynamics of social identity groups
(SIGs) and, separately, to predict violent conflict in a region. This document discusses the steps
that the study team took to process and analyze large volumes of social media data to gain
statistically relevant insights into SIGs and violent conflict.

1.1.  BACKGROUND 

The FOCUS model is a social dynamic model designed to predict changes in social
identity groups over time and space. Data-driven validation of the FOCUS model's capacity to
predict the dynamics of social identity group (SIG) formation would allow the project to confirm
the validity of the theoretical mechanisms encoded in the model. However, such efforts are
currently inhibited by an absence of high-resolution, geo-spatially registered SIG data that
could be systematically compared to the model's predictions. A spatio-temporal map of identity
fragmentation in social media discourse would provide an ideal empirical target for FOCUS,
allowing FOCUS to leap-frog competing projects which lack empirically validated, predictive
capabilities, and consequently fail to satisfy the promise of generating believable probability
distributions over the potential outcomes of operations intended to stabilize a region. The Naval
Postgraduate School and TRAC-MTRY researched the ability to generate these spatio-temporal
maps and relevant real-life data using Twitter data. This report discusses those efforts and the
initial results, which have the promise to support the validation of this social model.

1.1.1. Project History

In April of 2014 TRAC-MTRY had additional project funds available for research. Dr.
Camber Warren from the Defense Analysis department approached TRAC-MTRY with a desire
to research the ability to use social media data to analyze social identity groups in different
nations and the capability of social media data to predict violent conflict. TRAC-MTRY and
JWAC decided to fund this project by funding the purchase of a 10% random sample of one
year’s worth of worldwide Twitter data. The funds were transferred to NPS and Dr. Warren
purchased the data from GNIP (a Twitter data sales company) through NPS contracting.

Though this project was projected to start in June 2014, the contracting process took much longer
than anticipated. Twitter bought GNIP towards the end of the contracting process, which added
months of contract negotiation. The purchased Twitter data was finally delivered in
January 2015 and was the express property of NPS. Once NPS received the data, Dr. Warren
began organizing, processing and analyzing it. By May, Dr. Warren had created the
Python scripts to sort through the data. In August the analysis scripts were complete and Dr.
Warren was able to generate informative heat maps of Twitter activity in both Nigeria and Syria
and an academic paper that explained the process, methodology and results of his
initial analytic efforts using Twitter social media. Though these deliverables marked the
end of this project, Dr. Warren is continuing to build on his initial successes and there is
tremendous potential for follow-on projects that will look to improve on the analytic methods
used to gain greater understanding of the social dynamics of nations using social media.

1.2.  FOCUS  MODEL 

The Flow of Communication Upon Society (FOCUS) model is an agent-based social
stability model designed by Dr. Steven Hall (NPS) and Dr. Ryan G. Baird (JWAC) that models
the interaction of different population groups inside a network who are competing for political
agendas and resources. It uses a geographically situated agent-based modeling approach that
changes over time. Agents, individually representing people and collectively representing a
governable population, dialogue with various Factions, which are competing for the opportunity
to establish governing policy. As time progresses in the model, loyalties across the population
change and realign. FOCUS provides the means to explore the sensitivity of these emergent
loyalties to the various modeled influences as well as to the influence of the geographical
characteristics of the region, including population and media infrastructure distribution and
density (Hall and Baird 2013, 2). Key to the validation of this model is the ability to use real
world data as inputs into the model at time 0 and then compare the resulting model outputs at
time n to the corresponding real world conditions at time n. Ultimately, it is the hope of this
research team to build upon the methods employed in this research project to provide those
time-step 0 inputs and the validating time-step n real world conditions.
1.3. PROBLEM STATEMENT


Can  social  media  data  be  used  to  empirically  validate  the  theoretical  mechanisms 
encoded  in  the  FOCUS  model,  by  developing  measures  of  identity  fragmentation? 

1.3.1. Issues for Analysis.

Issue  1:  Can  Social  Media  data  provide  relevant  insight  into  a  target  country’s  social 
dynamic  in  time  and  space? 

EEA  1.1.:  Can  social  media  data  identify  SIG? 

EEA  1.2.:  Can  social  media  data  identify  or  predict  violent  conflict? 

1.4.  CONSTRAINTS,  LIMITATIONS  AND  ASSUMPTIONS. 

Constraints limit the study team's options to conduct the study. Limitations are a study
team's inabilities to investigate issues within the sponsor's bounds. Assumptions are
study-specific statements that are taken as true in the absence of facts.

•  Constraints: 

o  Complete  by  30  September  2015. 

o  Social  Media  data  is  limited  to  Twitter  data  from  August  1st,  2013  to  July  31st,  2014. 

•  Limitations: 

o  Study  is  limited  to  the  analysis  of  Nigeria  and  Syria  in  accordance  with  the  approved 
study  proposals. 

o  Usable  data  was  limited  to  geo-coded  tweets  which  represented  approximately  27% 
of  the  total  data  repository. 

o  Key  concepts  and  metrics  were  limited  to  social  identity  make-up,  national  identity, 
social  unrest  and  violent  conflict. 


•  Assumptions: 

o  Nigeria  and  Syria  provide  a  relevant  test  bed  for  developing  theoretical  metrics  that 
will  help  provide  insights  into  the  SIGs  and  social  unrest  of  all  nations. 

o Geo-coded tweets provide sufficient representative data to produce relevant
analysis on SIG and social unrest.
SECTION 2. METHODOLOGY


2.1.  OVERVIEW 

This section summarizes the methodology employed in this project to
gain insight into social identity groups and predict collective violence using social media. For
greater detail on the processing and analysis of our archived Twitter database, refer to the
attached technical paper written by Dr. Camber Warren entitled “Mapping the Rhetoric of
Violence: Political Conflict Discourse and the Emergence of Identity Radicalization in Nigerian
Social Media”, which is located in Appendix A.


2.2.  “BIG  DATA” 

The data for this research was an archived database of Twitter messages contracted
through GNIP. The data represented a 10% random sample of all public messages sent through
the Twitter network between 1 August 2013 and 31 July 2014. This archive constituted
approximately 12 billion messages and in an uncompressed format was approximately 40
Terabytes. Although tweets are limited to 140 characters of content, the actual Twitter file is
considerably larger due to embedded metadata. Examples of this additional metadata are user
identification information, profile information, and time and location information. As a part of
the GNIP contract our Twitter data was augmented with geo-location information in the form of
longitude and latitude coordinates. However, only roughly 27% of the files had geo-location
information. The implication of this was that only 27% of the data was useful for measuring
spatio-temporal subjects from the corpus of information that we possessed (Warren 2015, 9).
This usable dataset was further diminished when we began analysis of specific countries. In this
project, the usable, geo-located dataset for Nigeria accounted for approximately 14 million
tweets out of the 12 billion tweets in our archive.
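
The proportions implied by these figures can be checked with a few lines of arithmetic. The sketch below uses only the rounded values quoted above and assumes decimal terabytes (10^12 bytes); the variable names are illustrative and do not come from the project's scripts.

```python
# Rounded figures quoted in the text; decimal terabytes are an assumption.
TOTAL_MESSAGES = 12_000_000_000   # ~12 billion messages in the archive
ARCHIVE_BYTES = 40 * 10**12       # ~40 terabytes uncompressed
GEO_SHARE = 0.27                  # ~27% of files carried coordinates
NIGERIA_MESSAGES = 14_000_000     # ~14 million usable Nigerian tweets

# Average size of one stored message, metadata included.
bytes_per_message = ARCHIVE_BYTES / TOTAL_MESSAGES

# Number of messages usable for any spatio-temporal analysis.
geo_messages = TOTAL_MESSAGES * GEO_SHARE

# Fraction of the whole archive that fed the Nigeria analysis.
nigeria_share = NIGERIA_MESSAGES / TOTAL_MESSAGES

print(f"{bytes_per_message:.0f} bytes per message")
print(f"{geo_messages:.3g} geo-located messages")
print(f"{nigeria_share:.2%} of the archive is usable Nigerian data")
```

On these rounded inputs each stored message averages a little over 3 kilobytes, far more than 140 characters of text, which is consistent with the remark that metadata dominates the file size.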
2.3. HARDWARE CONFIGURATION


The sheer size of our archived Twitter database created tremendous challenges for
storage and processing. Without sufficient storage and processing hardware, processing the
40 Terabytes of information could take months of continuous run time. The data
storage and processing tool that made this research feasible was a Central Processing Unit
(CPU) / Graphics Processing Unit (GPU) hybrid server, designed to emphasize parallel
computation and in-memory processing, which are crucial for large-scale textual and geospatial
analytics. The primary processors consisted of 4 x 12-core Intel Xeon E7-4860v2 CPUs for a
total of 48 processing cores, which are capable of parallel processing. Additionally, there were
two NVIDIA Tesla K40C GPUs that together provide 5,760 GPU cores. GPUs can perform very
large numbers of arithmetic operations in parallel and are crucial
in high speed graphics and mathematical manipulations. The computer was further augmented
with 64 x 32GB DDR3L server memory modules that provided the server with 2 Terabytes of
Random Access Memory (RAM). This was perhaps the most critical component built into our
CPU/GPU hybrid because it provided an enormous and efficient workbench for data processing.
Finally, our server had 8 x 600GB 6Gb/s SATA solid-state drives (SSDs) that equated to 4.8
terabytes of persistent storage where the compressed Twitter data was archived. This
hardware combination allowed very rapid parallel processing, with all data manipulations
conducted on a RAM workbench that accelerated processing speeds.

It is worth noting that we initially hoped to use the tremendous computational
capabilities of the 5,760 GPU cores, but after significant research we discovered that GPUs were
limited to numerical computation, which is consistent with the needs of high speed
computer graphics, but incompatible with our textual analytics. Utilizing GPUs to process textual
data is currently an important research topic in industry, but no actionable solutions were available
to us at the time. The result of this discovery was that we were limited to the 48 CPU cores for
processing data. Though this was less than what our team hoped, it still allowed us to process
approximately 500,000 files per second, which equated to approximately seven hours of
continuous run time to process the 12 billion files of Twitter data.
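
The run-time estimate follows directly from the quoted throughput. The short check below uses only the rounded figures from the text; the per-core rate additionally assumes that work was split evenly across the 48 cores, which the report does not state.

```python
# Rounded figures quoted in the text.
TOTAL_FILES = 12_000_000_000   # ~12 billion Twitter files
FILES_PER_SECOND = 500_000     # approximate aggregate throughput
CPU_CORES = 48                 # 4 x 12-core CPUs

run_seconds = TOTAL_FILES / FILES_PER_SECOND
run_hours = run_seconds / 3600

# Assumption: an even split of work across cores (not stated in the report).
per_core_rate = FILES_PER_SECOND / CPU_CORES

print(f"{run_hours:.1f} hours of continuous run time")
print(f"{per_core_rate:.0f} files per second per core")
```

The result is roughly 6.7 hours, which matches the "approximately seven hours" quoted above.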
2.4. ANALYSIS METHODOLOGY


As  a  foundation  for  studying  Nigeria,  Dr.  Warren  examined  three  hypotheses  to  test  the 
usefulness  of  social  media  data  to  predict  regional  violence  (Warren  2015,  3): 

• H1. Spatio-temporal regions with higher levels of violent political rhetoric will
experience higher levels of violent political behavior.

• H2. Spatio-temporal regions with discourse characterized by more frequent
reference to the country of “Nigeria” as a whole will experience lower frequencies
of collective violence, because references to “Nigeria” indicate sentiment of
national identity and belonging.

• H3. Spatio-temporal regions with discourse characterized by more frequent
reference to the “Hausa” minority identity will experience higher frequencies of
collective violence. This is based on historical and social research that shows
that the Hausa minority group has been the primary center-of-gravity for violence
in Nigeria.

In order to analyze these hypotheses we built a script in Python that would open each
Twitter file and first check whether it had a geo-coded location that was inside Nigeria and was
regionally specific enough to show where in Nigeria the tweet occurred. The qualifying tweets were
simultaneously organized into 1-degree x 1-degree x 1-hour boxes of space-time along
with the tweets’ content, stored entirely in RAM. These files were organized into a “key-value”
store, which means that all records were indexed by a common key structure. The advantage of
this setup is that it organizes all keys into a 'hash table', which allows for very fast record look-up
speeds, even when the number of underlying records is very large (Warren 2015, 10). The result
was 14,322,348 separate Twitter messages from inside Nigeria that were organized by
space-time. This in-RAM dataset became the basis for our follow-on analysis.
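
A minimal sketch of this kind of space-time binning is shown below. It is an illustration under stated assumptions, not the project's actual script: the function names `cell_key` and `bin_tweets`, the tuple layout of a tweet, and the sample messages are all hypothetical, and a plain Python dictionary stands in for the key-value store.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone


def cell_key(lat, lon, timestamp):
    """Return the key of the 1-degree x 1-degree x 1-hour box holding a tweet."""
    hour = timestamp.replace(minute=0, second=0, microsecond=0)
    return (math.floor(lat), math.floor(lon), hour)


def bin_tweets(tweets):
    """Group tweet texts by space-time box in an in-memory hash table (dict)."""
    store = defaultdict(list)
    for lat, lon, timestamp, text in tweets:
        store[cell_key(lat, lon, timestamp)].append(text)
    return store


# Hypothetical sample records: (latitude, longitude, UTC time, text).
tweets = [
    (9.07, 7.49, datetime(2013, 8, 1, 10, 15, tzinfo=timezone.utc), "message a"),
    (9.90, 7.01, datetime(2013, 8, 1, 10, 59, tzinfo=timezone.utc), "message b"),
    (6.45, 3.39, datetime(2013, 8, 1, 10, 30, tzinfo=timezone.utc), "message c"),
]
store = bin_tweets(tweets)

# Look-up by key takes constant time on average, however many boxes exist.
abuja_box = store[(9, 7, datetime(2013, 8, 1, 10, tzinfo=timezone.utc))]
```

Because the key is a tuple of the two integer degree bins and the truncated hour, all messages in the same box share one dictionary entry, which is what makes later per-box counting cheap.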

Next, three categories of searchable words were developed to answer our three
hypotheses. Using the cross-language references in Wikipedia, different spelling variants of the
conceptual category “Nigeria” (e.g. 'najeriya', 'naijiriya', 'naijiria') were identified and scripted
into a hash table. This strategy was repeated for the conceptual category “Hausa.” Finally, a much
more complex hash table was built for the concept of “armed conflict,” which included such words as
‘stabbing’, ‘airstrike’, ‘soldier’, etc. In total a list of 366 English language terms, representing direct
references to objects and actions associated with armed conflict, was developed to capture the
concept of “armed conflict” in the content of our Nigerian dataset (Warren 2015, 33-39). These
terms were then translated into the five most common non-English languages in Nigeria (i.e. French,
Arabic, Hausa, Igbo, and Yoruba), which yielded a total of 1,195 unique search strings.
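
The idea of a concept look-up table can be sketched as follows. This is a hedged illustration only: the term sets contain just the handful of examples quoted above (plus the base spelling 'nigeria', which is an assumption), whole-word matching on lower-cased text is an assumed rule that the report does not specify, and the names `CONCEPT_TERMS` and `concepts_in` are hypothetical.

```python
import re

# Tiny illustrative subsets; the real lists held 1,195 search strings.
CONCEPT_TERMS = {
    "nigeria": {"nigeria", "najeriya", "naijiriya", "naijiria"},
    "armed_conflict": {"stabbing", "airstrike", "soldier"},
}

# Assumed tokenization: runs of word characters, case-insensitive.
TOKEN_RE = re.compile(r"\w+")


def concepts_in(text):
    """Return the set of concepts with at least one term present in the text."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    return {name for name, terms in CONCEPT_TERMS.items() if tokens & terms}


print(concepts_in("Airstrike reported near the border"))
print(concepts_in("Proud of Najeriya today"))
```

Storing each term list as a set means every token is checked by hashing rather than by scanning the list, so the cost per message does not grow with the number of search strings.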

With the search categories developed, each Twitter file in our Nigerian dataset was searched
to identify matches to our search strings. We then estimated a continuous spatial surface,
representing the relative density of messages referencing each concept in a particular place and time,
using 2-dimensional binned Gaussian kernel density interpolation (Warren 2015, 14). Additionally,
the same method was applied to the total Twitter message count to yield an estimated continuous
spatial surface for the total Twitter message density. The final values developed were the estimated
concept densities divided by the estimated total Twitter message densities over time and space.
The three concept ratios, together with the total message density, could now be used as four
distinct independent variables for statistical modeling.
A sample of the visual representation of these results can be viewed in Figure 1.
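
The ratio-of-densities step can be illustrated with a small, pure-Python sketch. It is not the estimator used in Warren (2015): the kernel bandwidth `sigma`, the truncation `radius`, the function names, and the toy 3 x 3 count grids are all assumptions made for illustration.

```python
import math


def gaussian_smooth(grid, sigma=1.0, radius=2):
    """Smooth a 2-D grid of binned counts with a truncated Gaussian kernel."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        weight = math.exp(-(dr * dr + dc * dc) / (2 * sigma ** 2))
                        total += weight * grid[rr][cc]
            out[r][c] = total
    return out


def relative_density(concept_counts, total_counts, sigma=1.0, radius=2):
    """Divide the smoothed concept surface by the smoothed total surface."""
    concept = gaussian_smooth(concept_counts, sigma, radius)
    total = gaussian_smooth(total_counts, sigma, radius)
    return [
        [(cv / tv) if tv > 0 else 0.0 for cv, tv in zip(c_row, t_row)]
        for c_row, t_row in zip(concept, total)
    ]


# Toy grids: every second message in each cell mentions the concept.
total_counts = [[4, 0, 0], [0, 2, 0], [0, 0, 4]]
concept_counts = [[2, 0, 0], [0, 1, 0], [0, 0, 2]]
ratio = relative_density(concept_counts, total_counts)
```

The kernel is left unnormalized because the same weights appear in the numerator and the denominator, so any normalizing constant cancels in the ratio. Dividing by the total density is what separates "this concept is discussed a lot here" from "a lot of tweeting happens here".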
[Figure 1 image: grid of maps. Panel columns: Total, “Hausa”, “Nigeria”. Panel dates:
08-01-2013, 01-31-2014, 07-31-2014.]

Figure 1. Spatio-Temporal Map of Nigeria (From Warren 2015, 18): These maps show the smoothed
densities of the estimated ‘total message density’, the concept of ‘Hausa’, and the concept of ‘Nigeria’ on
three different days at the beginning, middle and end of our Nigerian dataset. Darker colors of red
indicate higher densities of the concept, while lighter shades are lower densities (i.e. white is the most
extreme low density). The green circles represent the actual Twitter message locations and the size of
those circles represents comparative volume size.


In order to gain insight into the relevance of these variables to the modeling of violent
conflict, the team needed an accurate dataset of actual violent conflict in Nigeria that occurred
during the span of our dataset. Using the Armed Conflict Location and Event Data Project
(ACLED) v5 database (Raleigh 2015), which contained a list of all armed conflict in Nigeria
organized by date/time and latitude/longitude, we were able to populate a dependent variable that
we could then use to build simple statistical models to test for the statistical significance of our
independent variables and answer our hypotheses (Warren 2015, 15).
SECTION 3. RESULTS


3.1.  RESULTS  OF  ANALYSIS 

To test our three hypotheses, three models were developed using a heterogeneous
point process model with a Strauss inter-point interaction function designed to flexibly capture
patterns of spatial autocorrelation (Warren 2015, 15). The dependent variable for all three was
acts of recorded violence in Nigeria. Model 1 used total Twitter message density as the sole
independent variable. Model 2 included the four independent variables of ‘Total Message
Density’, ‘armed conflict’, ‘Nigeria’, and ‘Hausa’. Model 3 included all of the aforementioned
independent variables and added the interaction of ‘Nigeria’ x ‘Hausa’. The results of
these models showed several interesting facts. First, every independent variable was statistically
significant with p-values less than 0.05. Second, all models were significant in their ability to
model whether or not an act of collective violence in Nigeria occurred. Third, based on the
signs of the coefficients in Model 2, hypotheses 1-3 were supported. Regions
that experienced higher levels of violent rhetoric were more likely to experience collective
violence; regions that had higher densities of reference to the national notion of ‘Nigeria’
experienced lower levels of collective violence; and regions that had higher reference densities to
the concept of ‘Hausa’ experienced higher levels of collective violence. Refer to Figure 2 for a
more in-depth look at the statistical modeling results.

We must note here that this analysis does not establish whether social media data is
‘reflective’ or ‘constructive’ in nature, meaning we are not sure if social media discourse is just a
reflection of events occurring (i.e. collective violence) or if it actually has a causal effect and
leads to events occurring.
                           Model 1          Model 2          Model 3

Total Density              8.1990 ***       13.5342 ***      25.6532 ***
                           (1.1181)         (2.7577)         (3.2742)

"armed conflict"                            3.2000 ***       3.3299 ***
                                            (0.3618)         (0.3698)

"Nigeria"                                   -0.8351 **       -4.4354 ***
                                            (0.3105)         (0.5416)

"Hausa"                                     0.1518 ***       -1.2323 ***
                                            (0.0246)         (0.1806)

"Nigeria" x "Hausa"                                          1.7509 ***
                                                             (0.2226)

Intercept                  1.6191 ***       -0.8584 *        2.0897 ***
                           (0.1284)         (0.3770)         (0.5674)

Interpoint Interaction     0.0027 ***       0.0024 ***       0.0022 ***
                           (0.0003)         (0.0004)         (0.0004)

AIC                        -3882.43         -3917.65         -3984.23

Note: Coefficients from heterogeneous point process models. Standard errors in parentheses.
*p < 0.05. **p < 0.01. ***p < 0.001


Figure 2. Heterogeneous Point Process Modeling Results of three proposed models.
Comparative ‘Goodness’ of the models was assessed using AIC. Significance of the independent
variables was assessed by their associated p-values indicated by the asterisks.
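
As a reading aid for Figure 2, the sketch below recomputes two quantities from the reported numbers: the ratio of each Model 2 coefficient to its standard error, and the AIC differences between models. Two assumptions are made and should be read as such: the coefficient-to-standard-error ratio is treated as an approximate Wald z-statistic (the report does not say how the p-values were derived), and the Model 1 AIC is read as -3882.43, treating the stray leading character in the extracted figure as an artifact.

```python
# (coefficient, standard error) pairs transcribed from the Model 2 column.
MODEL_2 = {
    "total_density": (13.5342, 2.7577),
    "armed_conflict": (3.2000, 0.3618),
    "nigeria": (-0.8351, 0.3105),
    "hausa": (0.1518, 0.0246),
}

# AIC values from Figure 2; the Model 1 value is an assumed reading.
AIC = {"model_1": -3882.43, "model_2": -3917.65, "model_3": -3984.23}

# Assumed Wald-style statistic: coefficient divided by its standard error.
z_scores = {name: coef / se for name, (coef, se) in MODEL_2.items()}

# Lower AIC indicates a better trade-off between fit and complexity.
best_model = min(AIC, key=AIC.get)
delta_aic = {name: value - AIC[best_model] for name, value in AIC.items()}

for name, z in z_scores.items():
    print(f"{name:>15}: z = {z:6.2f}")
print("best model by AIC:", best_model)
```

Under these assumptions every ratio exceeds 1.96 in absolute value, which agrees with the asterisks in the table, and the signs match the hypotheses: positive for "armed conflict" and "Hausa", negative for "Nigeria". Model 3 has the lowest AIC of the three.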

3.2.  DISCUSSION 

The  results  of  this  research  show  that  eventual  application  to  the  validation  of  the  FOCUS 
model  will  be  possible.  Validating  the  FOCUS  model  requires  the  ability  to  map  SIGs  and 
sentiment  concepts  across  space  and  time.  By  re-examining  our  primary  ‘Issue  for  Analysis’  we 
can  see  that  social  media  data  does  have  the  capability  to  inform  on  these  inputs. 
Issue for Analysis 1: Can Social Media Data provide relevant insight into a country’s
social dynamic in time and space? Yes. In our proof-of-concept we were able to show that
statistically relevant metrics could be identified over time and space that could partially identify
SIGs and sentiment and could be used to model actual violent events in Nigeria.

EEA 1.1: Can social media data identify SIGs? Yes. Our metric of the concept
“Hausa” represented one of the most disenfranchised ethnic groups in Nigeria. The
Hausa are generally located in the north of the country, as seen in Figure 3. By
examining the heat maps from Figure 1, we see that generally, the high density regions
for the search concept of “Hausa” were predominantly clustered in the north of the
country as well. Although this is not definitive proof that this SIG was identified, when
coupled with the fact that this metric was statistically significant in predicting violent
events in accordance with our hypothesis, we can make the case that social media data
can identify SIGs.

EEA  1.2:  Can  Social  media  data  identify  or  predict  violent  conflict?  Yes.  Our  statistical 
modeling  approach  showed  that  we  could  create  relevant  statistical  models  that 
identified  violent  conflict  by  applying  relevant  independent  variables  pulled  from  social 
media  data. 
[Figure 3 image: map titled “Mapping Nigeria’s Diversity”. Legend, major ethnic groups
(% of population): Hausa-Fulani 29%, Yoruba 21%, Igbo 18%, Kanuri 4%, Ibibio 3.5%, Tiv 2.5%,
Highly Diverse Areas 12%. Source: Ulrich Lamm. Modified by author.]

Figure 3. Ethnic Map of Nigeria (From Kwaja 2011, 3). This map shows that the Hausa ethnic group is
primarily located in the northern regions of Nigeria, which is consistent with the high densities of the
concept “Hausa” seen in Figure 1.
SECTION 4. RECOMMENDATIONS


This research represents only the earliest phases of research designed to determine the
feasibility of social media data use for measuring and modeling events occurring inside national
borders. There is tremendous room for expanded research using the principles of
spatio-temporal statistical analysis that this project explores. For a start we recommend exploring the
scalability of applying social media data to regions of interest. Interesting results could be
gained from more refined analysis of cities or districts within a country. Additionally, significant
insights could be gained from enlarging the region of interest to multi-country regions and
continents. Another important expansion of this research should address the degree to which social
media discourse is ‘reflective’ or ‘constructive’ in nature. One way to address this could be to
model collective violence using social media variables in a time-series approach to see if social
discourse can predict collective violence. Lastly, we would recommend that regional subject
matter experts be used to create more refined and insightful search concepts that would better
identify social identity groups and localized expressions that more effectively capture
expressions of collective violence or socio-political conflict.
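
One simple form the time-series approach mentioned above could take is a lagged correlation between a rhetoric series and a violence series for the same region. The sketch below is purely illustrative: the function names are hypothetical, the two series are synthetic toy data constructed so that violence trails rhetoric by one step, and a lagged correlation by itself would not settle the ‘reflective’ versus ‘constructive’ question.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def lagged_correlation(rhetoric, violence, lag):
    """Correlate rhetoric at time t with violence at time t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(rhetoric, violence)
    return pearson(rhetoric[:-lag], violence[lag:])


# Synthetic toy series: violence repeats rhetoric one time step later.
rhetoric = [1, 3, 2, 5, 4, 6, 2, 1]
violence = [0, 1, 3, 2, 5, 4, 6, 2]

same_step = lagged_correlation(rhetoric, violence, 0)
one_step_ahead = lagged_correlation(rhetoric, violence, 1)
```

A correlation that peaks at a positive lag would be consistent with rhetoric preceding violence, while a peak at lag zero or a negative lag would point toward discourse that mainly reflects events.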
APPENDIX A. “MAPPING THE RHETORIC OF VIOLENCE:
POLITICAL CONFLICT DISCOURSE AND THE EMERGENCE OF
IDENTITY RADICALIZATION IN NIGERIAN SOCIAL MEDIA”

The  attached  academic  paper,  written  by  Assistant  Professor  Camber  Warren,  is  the 
foundation  for  the  content  of  this  technical  memo.  It  contains  the  technical  solutions  to  the 
research  problem  that  this  project  addressed  and  the  methods  and  tools  that  were  used  to  answer 
the  elements  of  that  problem. 
Mapping the Rhetoric of Violence:
Political Conflict Discourse and the Emergence of Identity
Radicalization in Nigerian Social Media¹


T.  Camber  Warren 

Department  of  Defense  Analysis 
Naval  Postgraduate  School 

CamberW@gmail.com 


Abstract 

While  there  is  widespread  agreement  amongst  scholars  and  practitioners  that  processes  of 
popular  radicalization  frequently  underlie  the  generation  of  insurgent  violence,  an  absence  of 
high-resolution  data  has  prevented  existing  work  from  directly  validating  this  relationship.  To 
begin  to  fill  this  gap,  I  seek  to  leverage  new  social  media  technologies  to  our  advantage,  by  using 
them  as  a  means  of  data  collection.  More  specifically,  I  show  that  newly  developed  tools  for 
geo-coding  the  sending  locations  of  messages  sent  through  the  Twitter  network,  automated 
estimations  of  the  sentiments  expressed  in  those  messages,  and  spatial  interpolation  of  those 
estimates,  can  be  used  to  generate  dynamic,  data-driven  maps  of  national  attachments  and 
political  extremism  amongst  the  members  of  a  given  population.  This  approach  is  applied  to  the 
analysis  of  identity  radicalization  and  fragmentation  in  Nigeria,  over  the  period  August  2013  to 
July  2014.  The  results  demonstrate  that  network-analytic  metrics  derived  from  spatio-temporal 
variation  in  social  media  content  hold  substantial  promise  for  enhancing  our  understanding  of  the 
conditions  which  most  favor  the  emergence  of  political  extremism  and  collective  violence. 


1  Prepared  for  presentation  at  the  Annual  Meeting  of  the  American  Political  Science  Association,  Sept.  3rd-6th,  2015, 
San  Francisco,  CA. 
Introduction


A burgeoning body of literature increasingly points to the importance of communication
dynamics in the generation of armed conflict and collective violence (Pierskalla and Hollenbach
2013; Shapiro and Weidmann 2015; Warren 2014, 2015; Weidmann 2015), and in particular the
role played by polarization along newly politicized ethnic cleavages (Bhavnani and Miodownik
2009; Buhaug, Cederman, and Rød 2008; Cederman, Weidmann, and Gleditsch 2011;
Cederman, Wimmer, and Min 2010). However, an absence of suitable data has prevented
existing work from directly validating the relationship between patterns of political
communication and patterns of political violence.

To  begin  to  fill  this  gap,  I  seek  to  leverage  new  social  media  technologies  to  our 
advantage,  by  using  them  as  a  means  of  data  collection.  More  specifically,  I  show  that  newly 
developed  tools  for  geo-coding  the  sending  locations  of  messages  sent  through  the  Twitter 
network,  automated  estimations  of  the  sentiments  expressed  in  those  messages,  and  spatial 
interpolation  of  those  estimates,  can  be  used  to  generate  dynamic,  data-driven  maps  of  national 
attachments  and  political  extremism  amongst  the  members  of  a  given  population. 

As  an  initial  plausibility  probe,  this  approach  is  applied  to  the  analysis  of  identity 
radicalization  and  fragmentation  in  Nigeria,  over  the  period  August  2013  to  July  2014.  In 
particular,  I  hypothesize  that  spatio-temporal  variation  in  discursive  references  to  particular 
conceptual  categories  will  be  systematically  related  to  the  generation  of  events  of  collective 
violence.  Extending  the  argument  presented  in  Warren  (2014)  and  Warren  (2015),  I  claim  that 
this  linkage  represents  a  fundamental  mechanism  in  the  production  of  collective  violence.  In 
brief,  large-scale  violence  requires  the  successful  production  and  dissemination  of  political  ideas 
justifying  that  violence.  As  a  result,  violence  must  be  spoken  into  existence,  before  it  can  be 
enacted. This implies that it may be possible to observe increases in the production of violent
rhetoric  prior  to  the  emergence  of  violent  acts,  and  perhaps  even  to  use  such  measurements  to 
predict  the  occurrence  of  collective  violence  before  it  erupts  in  actuality.  Moreover,  this 
perspective  implies  that  variation  in  the  basic  conceptual  categories  of  political  communication 
could  exercise  profound  effects  on  the  likelihood  of  large-scale  conflict.  In  regions  where 
political  discourse  tends  to  deploy  the  unifying  categories  of  “nation”  and  “country”,  it  may  be 
more  difficult  to  generate  the  kinds  of  political  ideation  which  justify  violence  against  one’s 
fellow  citizens.  In  contrast,  in  regions  where  the  dominant  discourse  revolves  instead  around 
narrow  sectarian  identities,  it  may  be  easier  for  political  actors  to  generate  the  kinds  of 
animosities  that  feed  spirals  of  polarized  violence.  Nigeria  provides  a  particularly  interesting 
window  on  such  dynamics,  as  the  north  of  the  country  has  recently  been  characterized  by 
increasingly  vociferous  mobilization  of  the  “Hausa”  ethnic  minority,  by  political  actors  seeking 
greater  regional  autonomy.  I  will  thus  examine  the  following  hypotheses: 

H1. Spatio-temporal regions with higher levels of violent political rhetoric will
experience  higher  levels  of  violent  political  behavior. 

H2.  Spatio-temporal  regions  with  discourse  characterized  by  more  frequent  reference 
to  the  country  of  “Nigeria”  as  a  whole  will  experience  lower  frequencies  of 
collective violence.

H3. Spatio-temporal regions with discourse characterized by more frequent reference
to  the  “Hausa”  minority  identity  will  experience  higher  frequencies  of  collective 
violence. 

The  Predictive  Power  of  Social  Media 

With the surging global popularity of social media platforms, researchers from a variety
of  disciplines  have  begun  seeking  analytic  approaches  which  might  allow  predictive  insights  to 
be  derived  from  social  media  streams  in  an  unsupervised  fashion.  While  some  have  focused  on 
the  aggregate  dynamics  of  popular  culture  (Agarwal,  Xie,  Vovsha,  Rambow,  et  al.  2011;  Asur 
and  Huberman  2010;  Bae  and  Lee  2012;  Barbosa  and  Feng  2010;  Benhardus  and  Kalita  2013; 
Bessi,  Caldarelli,  Vicario,  Scala,  et  al.  2014;  Cataldi,  Caro,  and  Schifanella  2010;  Golder  and 
Macy  2011;  Hansen,  Arvidsson,  Nielsen,  Colleoni,  et  al.  2011;  Jansen,  Zhang,  Sobel,  and 
Chowdury 2009; Java, Song, Finin, and Tseng 2007; Kim, Bak, and Oh 2012; Lerman and
Ghosh 2010; Lerman and Hogg 2010; Leskovec, Adamic, and Huberman 2007; Lin, Keegan,
Margolin, and Lazer 2014; Morris, Counts, Roseway, Hoff, et al. 2012; Naaman, Boase, and Lai
2010; Naveed, Gottron, Kunegis, and Alhadi 2011; Suh, Hong, Pirolli, and Chi 2010; Wu and
Huberman 2007; Wu, Hofman, Mason, and Watts 2011), others have attempted to use metrics
derived from individual messages to develop algorithms that ‘learn’ the underlying sentiments of
individual  communicators  (Abbasi,  Chen,  and  Salem  2008;  Agarwal,  Xie,  Vovsha,  Rambow,  and 
Passonneau  2011;  Bae  and  Lee  2012;  Barbosa  and  Feng  2010;  Bifet  and  Frank  2010;  Bollen, 
Pepe, and Mao 2011; Dodds, Harris, Kloumann, Bliss, et al. 2011; Fan, Zhao, Chen, and Xu
2014; Ghiassi, Skinner, and Zimbra 2013; Golder and Macy 2011; Huang, Peng, Li, and Lee
2013; Jiang, Yu, Zhou, Liu, et al. 2011; Mitchell, Frank, Harris, Dodds, et al. 2013; O’Connor,
Balasubramanyan, Routledge, and Smith 2010; Pak and Paroubek 2010; Stieglitz and Dang-Xuan
2012; Thelwall, Buckley, and Paltoglou 2011; Wang, Can, Kazemzadeh, Bar, et al. 2012).
However, both approaches have faced serious difficulties in the pursuit of systematic empirical
validation. In particular, the lack of any systematic cross-linguistic and cross-cultural
‘ground-truth’ against which to compare automated sentiment classifications has generally forced such
researchers  to  limit  themselves  to  single-language  (usually  English)  texts  drawn  from  limited 
domains  (e.g.  news  reports,  movie  reviews,  etc.). 

In  contrast,  a  more  recent  wave  of  scholarship  has  sought  to  develop  metrics  geared 
towards  the  generation  of  explicit  predictions,  which  can  be  compared  more  directly  to  observed 
events.  In  particular,  researchers  have  shown  that  mood-based  signals  drawn  from  aggregate 
streams  of  Twitter  messages  are  partially  predictive  of  swings  in  financial  markets  (Bollen,  Mao, 
and  Zeng  2011;  Zhang,  Fuehres,  and  Gloor  2011,  2012).  Along  similar  lines,  a  number  of 
researchers  have  found  that  political  election  results  can  be  predicted  with  some  accuracy 
through  relatively  simple  counts  of  references  to  the  opposing  candidates  (Adamic  and  Glance 
2005; Bermingham and Smeaton 2011; Franch 2013; Gayo-Avello 2013; Lassen and Brown
2011; Metaxas and Mustafaraj 2012; Tumasjan, Sprenger, Sandner, and Welpe 2010; Wang,
Can, Kazemzadeh, Bar, and Narayanan 2012). While such work has generated more convincing
evidence  that  useful  information  can  be  derived  from  social  media  streams  in  an  automated 
fashion,  such  ‘predictions’  have  generally  been  limited  to  relatively  simple  outcomes,  and  have 
been  somewhat  limited  in  their  ability  to  shed  light  on  the  actual  mechanisms  underlying  the 
events  of  interest. 

Taking  a  different  angle  on  social  media  research,  other  researchers  have  sought  to  use 
these  new  communication  media  as  sources  of  data  on  the  behavior  of  underlying  human 
populations. Seen from this perspective, social media represent a new kind of human
“macroscope”, allowing researchers to measure quantities that would have previously remained
opaque to observation, at a scale and resolution that would have previously been impossible to
achieve. In this way, social media can serve as a new tool for developing enhanced
understanding of the fundamental mechanisms underlying human social and political
interactions. For instance, a number of works have begun investigating how cultural products
achieve popularity, examining both the content-level and context-level factors that lead messages
to be repeated, and developing new models of the dynamics of information diffusion (Aral and
Walker 2012; Bakshy, Hofman, Mason, and Watts 2011; Bliss, Kloumann, Harris, Danforth, et
al. 2012; Boyd, Golder, and Lotan 2010; Cha, Haddadi, Benevenuto, and Gummadi 2010;
Dodds, Harris, Kloumann, Bliss, and Danforth 2011; Eisenstein, O’Connor, Smith, and Xing
2014; Golder and Yardi 2010; Golub and Jackson 2010; Gomez, Manuel, and Krause 2010;
Hansen, Arvidsson, Nielsen, Colleoni, and Etter 2011; Kwak, Lee, Park, and Moon 2010;
Pfitzner, Garas, and Schweitzer 2012; Romero, Meeder, and Kleinberg 2011; Shamma,
Kennedy, and Churchill 2011; Stieglitz and Dang-Xuan 2012; Zaman, Herbrich, Gael, and Stern
2010). In a similar vein, researchers have begun to examine the forces underlying the generation
of ‘collective attention’, combining empirical measures with simulation models of competition
between ‘memes’, to examine the operation of ecological constraints on message reproduction
(Benhardus and Kalita 2013; Cataldi, Caro, and Schifanella 2010; Hong and Davison 2010;
Jungherr and Jurgens 2013; Lehmann, Gonçalves, Ramasco, and Cattuto 2012; Mehrotra,
Sanner, Buntine, and Xie 2013; Mei, Liu, Su, and Zhai 2006; Sasahara, Hirata, Toyoda,
Kitsuregawa, et al. 2013; Weng, Flammini, Vespignani, and Menczer 2012; Wu and Huberman
2007), while others have used data from social media streams to build models of the mechanisms
underlying the formation and dissolution of social ties between individuals (Bollen, Gonçalves,
Ruan, and Mao 2011; Bond et al. 2012; Coviello et al. 2014; Fan, Zhao, Chen, and Xu 2014;
Frank, Mitchell, Dodds, and Danforth 2012; Golder and Yardi 2010; Gonzalez, Cuevas, Cuevas,
and Guerrero 2011; Himelboim, McCreery, and Smith 2013; Kuehn, Martens, and Romero 2014;
Lazer et al. 2009; Mitchell, Frank, Harris, Dodds, and Danforth 2013; Mutz 2002; Shalizi and
Thomas 2011; Zamal, Faiyaz, and Ruths 2012).

Increasingly,  such  efforts  are  also  being  applied  to  the  political  domain,  yielding 
substantial  new  insights  into  the  dynamics  of  public  opinion,  electoral  competition,  and  political 
persuasion  (Adamic  and  Glance  2005;  Ausserhofer  and  Maireder  2013;  Barbera  and  Rivero 
2014;  Barbera  2014,  2015;  Barbera,  Jost,  Nagler,  Tucker,  et  al.  2015;  Bermingham  and  Smeaton 
2011; Bond and Messing 2015; Chadwick 2006, 2013; Conover et al. 2011; Conover, Gonçalves,
Flammini, and Menczer 2012; Conover, Gonçalves, Ratkiewicz, Flammini, et al. 2011;
DiGrazia, McKelvey, Bollen, and Rojas 2013; Farrell 2012; Feller, Kuhnert, Sprenger, and
Welpe 2011; Golbeck and Hansen 2014; Grossman, Humphreys, and Sacramone-Lutz 2014;
Himelboim, McCreery, and Smith 2013; Lawrence, Sides, and Farrell 2010; Monroe, Colaresi,
and Quinn 2008; Mustafaraj, Finn, Whitlock, and Metaxas 2011, 2011; Parmelee and Bichard
2012;  Prior  2007;  Ringsquandl  and  Petkovic  2013;  Shirky  2011;  Stieglitz  and  Dang-Xuan  2012; 
Wojcieszak  and  Mutz  2009;  Yardi  and  Boyd  2010).  In  addition  to  the  study  of  ‘normal’  politics, 
researchers  are  also  increasingly  using  metrics  derived  from  social  media  to  shed  new  light  on 
the  dynamics  of  social  mobilization,  political  polarization,  and  collective  violence  (Aday  et  al. 
2010;  Bailard  2015;  Brandt,  Freeman,  and  Schrodt  2011,  2014;  Colbaugh  and  Glass  2012; 
Conover  et  al.  2013;  Gleason  2013;  Gohdes  2015;  Hammond  and  Weidmann  2014;  Howard  and 
Hussain 2013, 2011; Hussain and Howard 2013; Lotan, Graeff, Ananny, Gaffney, et al. 2011;
Martin-Shields and Stones 2014; Metternich, Dorff, Gallop, Weschle, et al. 2013; Metzger et al.
2014;  Munger  2014;  Pierskalla  and  Hollenbach  2013;  Ramakrishnan  et  al.  2014;  Ritter  and 
Trechsel  2014;  Schroeder,  Everton,  and  Shepherd  2014;  Shapiro  and  Weidmann  2015;  Siegel 
2014;  Theocharis  2013;  Tudoroiu  2014;  Tufekci  and  Wilson  2012;  Wang,  Gerber,  and  Brown 
2012;  Ward  et  al.  2013;  Warren  2015;  Windt  and  Humphreys  2014;  Wolfsfeld,  Segev,  and 
Sheafer  2013;  Zeitzoff,  Kelly,  and  Lotan  2015;  Zeitzoff  2013).  Moreover,  while  such  research 
has  generally  found  that  such  technologies  decrease  stability  in  weak-state  environments,  other 
researchers  have  pointed  to  the  ability  of  authoritarian  governments  to  also  turn  such  tools  to 
their  advantage  (Gohdes  2015;  Howard,  Agarwal,  and  Hussain  2011;  Kalathil  and  Boas  2003; 
King,  Pan,  and  Roberts  2013;  Lynch  2011;  Morozov  2011;  Munger  2014;  Rod  and  Weidmann 
2015). 

A  Spatio-Temporal  Approach 

In  most  of  the  analyses  reported  above,  metrics  were  calculated  based  on  units  of  analysis 
characterized  by  individual  users,  or  individual  messages.  The  difficulty  with  such  approaches, 
when  attempting  to  make  statistical  judgements  concerning  the  underlying  population,  is  that  the 
sample  is  likely  to  be  strongly  biased  along  a  number  of  dimensions.  It  is  well  known  that  use  of 
social  media  correlates  with  a  number  of  demographic  characteristics,  including  age  and  wealth, 
and  that  social  media  users  are  therefore  unlikely  to  provide  a  fully  representative  sample  of  the 
underlying  population  (Ansolabehere  and  Hersh  2012;  Barbera  and  Rivero  2014;  Mislove, 
Lehmann,  Ahn,  Onnela,  et  al.  2011).  As  a  result,  metrics  for  which  “users”  are  in  the 
denominator (i.e. positive messages per user per day) are likely to be similarly biased.

The approach adopted here is instead to characterize the relevant metrics as functions of
space-time  units,  rather  than  as  proportions  of  users.  Here,  I  take  inspiration  from  recent  work 
which  has  shown  improvements  in  our  abilities  to  make  automatic  judgements  of  geographic 
location from unstructured text in Twitter user profiles (Blanford, Huang, Savelyev, and
MacEachren  2015;  Cheng,  Caverlee,  and  Lee  2010;  Compton,  Jurgens,  and  Allen  2014;  Conover 
et  al.  2013;  Hawelka  et  al.  2014;  Kaltenbrunner  et  al.  2012;  Kulshrestha,  Kooti,  Nikravesh,  and 
Kp.  2012;  Lee  and  Sumiya  2010;  Leetaru,  Wang,  Cao,  Padmanabhan,  et  al.  2013;  Mitchell, 
Frank,  Harris,  Dodds,  and  Danforth  2013;  Nemeth,  Mauslein,  and  Stapley  2014;  Takhteyev, 
Gruzd,  and  Wellman  2012;  Yuan,  Cong,  Ma,  Sun,  et  al.  2013).  This  approach  allows  researchers 
to  greatly  expand  the  sample  of  Twitter  messages  which  can  be  geo-referenced  (from  around  2% 
to  27%),  by  avoiding  the  need  for  GPS  coordinates,  and  instead  relying  on  the  user-reported 
hometowns  from  their  public  profiles. 

The  starting  point  for  this  analysis  is  an  archived  database  of  Twitter  messages, 
representing  a  fully  randomized  10%  sample  of  all  public  messages  sent  through  the  Twitter 
network  between  August  1st,  2013  and  July  31st,  2014;  approximately  12  billion  messages  in 
total.2  In  uncompressed  format,  this  archive  represents  approximately  40  Terabytes  of  textual 
data,  and  so  the  very  scale  which  offers  this  new  “macroscope”  also  represents  a  challenge  for 
standard  computational  approaches,  which  search  across  strings  in  serial  order.  The  solution 
adopted  here  is  to  script  the  production  of  “in-memory”  database  indexes,  organized  to  reflect 
bins of space, time, and other nested concepts. In particular, I utilize what is known as a
“key-value” store, which means that all records are indexed by a common key structure, which is just

2  Archive  licensed  through  agreement  between  Twitter,  Inc.  and  the  U.S.  Naval  Postgraduate  School,  as  part  of  the 
“Global  Data  Initiative.”  See  www.camberwarren.net/gdi. 
a string describing membership in some set of containers in which many individual records are
stored.  The  database  is  a  modified  version  of  the  open-source  Aerospike  database,3  which  I  have 
expanded  to  allow  for  highly-parallelized  loading  of  data  into  RAM,  by  creating  separately 
threaded  communication  channels  for  each  logical  CPU  core  in  the  system,  allowing  ‘swarms’  of 
parallel  computational  workers  to  operate  in  tandem,  and  avoid  resource  conflicts,  without  the 
need  for  hierarchical  control  structures.  The  advantage  of  this  setup  is  that  it  organizes  all  keys 
into  a  'hash  table',  which  allows  for  very  fast  record  look-up  speeds,  even  when  the  number  of 
underlying  records  is  very  large. 

Our  first  task  is  to  use  this  memory  structure  to  reference  each  message  to  a  location  in 
space,  given  by  latitude  and  longitude  coordinates.  To  do  so,  I  draw  on  data  from  the 
geonames.org  gazetteer,  an  open-source  database  of  named  geographic  places.  The  database 
contains  references  to  over  10  million  individual  locations,  with  latitude  and  longitude 
coordinates, in addition to over 2 million alternate names and spellings, spanning over 100
languages. Converting this information into a searchable form requires first ‘tokenizing’ the
individual  strings  into  meaningful  chunks  (i.e.  words  and  phrases).  This  process  is  relatively 
straightforward  for  English,  as  it  makes  consistent  use  of  spaces  to  differentiate  words. 

However,  this  pattern  is  far  from  universal  in  other  languages.  For  instance,  ideographic 
languages  such  as  Chinese  and  Japanese  generally  use  long  strings  of  characters  with  no  spaces 
in  between  words,  while  Vietnamese  uses  spaces  in  between  each  syllable  of  a  single  word. 
Moreover,  sometimes  atomic  concepts,  such  as  “China”,  are  represented  by  ‘words’  composed  of 

3 See http://www.aerospike.com/. The database application also makes use of a modified version of the UltraJSON
python library (https://github.com/esnme/ultrajson), which I have expanded to allow for bulk parsing of large,
multiline text files, and a modified version of the RE2 python library (https://github.com/facebook/pyre2/),
expanded  to  allow  for  grouped  regular  expression  pattern  matching  using  hierarchically  nested  terms.  All  modified 
source  code  will  be  redistributed  on  an  open-source  basis.  Contact  author  for  details. 
one, two, three, or more ideographic characters. In Cambodian, a number of common place
names require as many as eight ideographic word-characters to write the string referring to a
single  city.  Thus,  the  very  notion  of  what  counts  as  a  “word”  or  “phrase”  is  difficult  to 
generalize across languages. The solution generally adopted in the works cited above has been
to either ignore the problem by focusing on English place names, or to develop language-specific
parsers for particular applications. But this requires expensive computations, as each parser must
actually read and make sense of the string in order to determine the proper word/phrase
boundaries,  and  so  cannot  be  feasibly  implemented  for  search  across  a  large  number  of 
languages  simultaneously. 

Instead,  I  construct  a  generic  multilingual  phrase  index  by  segmenting  each  text  string 
arbitrarily,  without  expending  any  effort  to  ‘read’  or  make  semantic  sense  of  the  underlying  text. 
To do so, I make use of a particular text encoding format, known as “UTF-8”, which has the
advantage of coding all characters in fixed-size arrays of bytes. A roman letter, such as “a” for
instance, is stored in a single byte, whereas nearly all ideographic characters in common use are
stored as either 3 bytes or 6 bytes. This means that whereas roman scripts can be split into words
by breaking at every space, ideographic scripts can be broken into potential words by splitting the
string  in  byte  lengths  of  multiples  of  3.  Some  of  the  resulting  sub-strings  will  be  nonsense,  but 
they  can  be  easily  screened  out  by  attempting  to  re-encode  the  bytes  as  valid  UTF-8  characters, 
and  discarding  any  uninterpretable  sub-strings.  Arbitrary  phrases  are  thus  constructed  from  each 
string  by  first  splitting  at  every  space,  and  then  taking  any  remaining  non-roman  characters  and 
extracting  all  unique  substrings  with  lengths  equal  to  multiples  of  3,  and  then  concatenating  the 
resulting words into space-separated sequences (i.e. ‘phrases’) consisting of all unique
sub-sequences with length less than some maximum phrase length. In the results reported below, I
allow for phrase lengths up to 9 ‘words’, to account for difficult strings such as “Cộng hòa Xã hội
chủ nghĩa Việt Nam”, which is the name of the country of “Vietnam” written in Vietnamese, and
“ភ្នំពេញ”, which is the name of the city of “Phnom Penh” written in Cambodian. Each of
these  phrases  is  then  separately  indexed  in  an  in-memory  hash  table,  as  described  above.  The 
result  is  a  search  index  composed  of  approximately  23  million  unique  text  phrases. 
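
The segmentation procedure described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names `byte_chunks` and `phrase_index` are hypothetical, the case-folding step is an added assumption, and for brevity the byte-level chunks are indexed on their own rather than being recombined into longer multi-chunk phrases.

```python
def byte_chunks(token, step=3):
    """Return every substring of `token` whose UTF-8 byte length is a
    multiple of `step` and which decodes as valid UTF-8."""
    raw = token.encode("utf-8")
    chunks = set()
    for start in range(len(raw)):
        for end in range(start + step, len(raw) + 1, step):
            try:
                chunks.add(raw[start:end].decode("utf-8"))
            except UnicodeDecodeError:
                continue  # the slice cut through a character: discard it
    return chunks


def phrase_index(text, max_words=9):
    """Build the set of candidate phrases for one text string."""
    # Assumption: case-fold so that "Ohio" and "ohio" share one key.
    words = text.casefold().split()
    phrases = set()
    # All runs of consecutive space-separated words, up to max_words long.
    for i in range(len(words)):
        for j in range(i + 1, min(i + max_words, len(words)) + 1):
            phrases.add(" ".join(words[i:j]))
    # Words containing non-ASCII characters are also split blindly by byte length.
    for word in words:
        if not word.isascii():
            phrases |= byte_chunks(word)
    return phrases
```

For example, `phrase_index("Springfield Ohio")` yields the three phrases `springfield`, `ohio`, and `springfield ohio`, while a two-character ideographic string yields each character separately plus the pair. No language-specific parsing is involved at any point, which is the property the approach depends on.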

Input  search  strings  are  taken  from  the  “Location”  field  associated  with  each  Twitter 
message, which is simply a box into which users can type free-form descriptions of the location
(usually  a  hometown)  from  which  they  are  sending  their  messages.  These  input  strings  are 
tokenized  through  the  same  procedure,  allowing  one-to-one  matching  of  exact  phrases.  When 
multiple  matching  strings  are  found,  the  algorithm  narrows  the  potential  matches  by  first 
checking  for  nested  overlaps  between  administrative  units,  such  as  “Ohio”,  and  specific  places, 
such  as  “Springfield”,  and  then  prioritizes  matches  to  more  specific  places  over  matches  to  more 
general  areas.  To  break  further  ties,  the  algorithm  then  relies  on  a  simple  measure  of  the 
“salience”  of  the  information  in  the  search  result,  by  assigning  a  score  to  each  potential  match, 
given  by: 


where P is the total population of the place, as recorded in the Geonames database, and L is the
byte-length of the matching character string.

For each record, we first check whether GPS coordinates are available (less than 2% of
the sample), and if they are not then we attempt to match any location text using the procedure
described above. Records for which no matching locations can be found, or which can only be
matched at the level of countries or top-level administrative units, are discarded. The remaining
records (approximately 27% of the original sample) are then parsed, assigned latitude, longitude,
and timestamp coordinates, and stored in a separate key-value database, in which the keys are
given by unique combinations of discrete units of space and time. In this way, the keys of the
database function as spatio-temporal indexes, allowing for high-speed access of chunks of
records defined by discrete ranges of time and space. The chunks are defined in units of
latitude/longitude degrees and hours, so that each storage bin holds the records for a 1-degree x
1-degree x 1-hour box of space-time. The result is an in-memory structured representation of
each record, stored entirely in RAM, recording the full text of each message, the estimated
geo-coordinates of the user's sending location, and the date and time when the message was sent.
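
As a rough illustration of how such spatio-temporal keys might be formed, the sketch below bins each record into a 1-degree x 1-degree x 1-hour box and stores it in an ordinary Python dictionary. The key layout (`lat:lon:YYYYMMDDHH`), the use of UTC, and the names `bin_key`, `put`, and `store` are all assumptions made for the example; the actual key format used in the modified Aerospike database is not specified here.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone


def bin_key(lat, lon, sent_at):
    """Key for the 1-degree x 1-degree x 1-hour box containing a record."""
    # Assumption: timestamps are timezone-aware and binned in UTC.
    hour = sent_at.astimezone(timezone.utc).strftime("%Y%m%d%H")
    # floor() keeps negative coordinates in the correct box (-0.5 -> -1).
    return f"{math.floor(lat)}:{math.floor(lon)}:{hour}"


# A plain dict of lists stands in for the in-memory key-value store.
store = defaultdict(list)


def put(lat, lon, sent_at, text):
    """File one geo-referenced message under its space-time key."""
    store[bin_key(lat, lon, sent_at)].append((lat, lon, sent_at, text))
```

Because all records in a box share one key, retrieving a range of space and time reduces to enumerating the keys covering that range and performing one hash look-up per key.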

Using  this  approach,  I  identify  14,322,348  separate  Twitter  messages  sent  from  within 
the  boundaries  of  Nigeria,  between  August  1st,  2013  and  July  31st,  2014.  This  set  of  records 
forms the basis for the results reported below. In order to provide predictive leverage on the
location and timing of violent events, I seek to side-step the thorny issues associated with
cross-cultural interpretations of complex symbols, attitudes, and sentiments, and focus instead on
discursive  references  to  particular  “concepts”,  for  which  more  rigorous  bounds  can  be  defined  on 
a  cross-cultural  basis.  In  particular,  I  aim  to  capture  simple  indicators  of  three  concepts,  with 
differing levels of complexity: (1) a country (“Nigeria”) understood as a fixed referent by those
familiar  with  the  term,  (2)  a  group  (“Hausa”)  representing  a  locus  of  recent  political  struggle,  and 
(3)  a  category  of  action  (“armed  conflict”)  which  can  be  objectively  defined  but  which  is 
described  in  practice  through  a  wide  array  of  terms. 

The  concepts  of  “Nigeria”  and  “Hausa”,  while  complex  in  a  sociological  sense,  are 
relatively  easy  to  search  for  in  text  form.  Even  across  the  major  linguistic  communities  in 
Nigeria, these terms tend to be spelled in approximately the same way. Using the cross-language
references in Wikipedia, I identify seven local spelling variants for “Nigeria” (‘nijeriya’,
‘najeriya’, ‘naijiriya’, ‘naijiria’, ‘naigeria’, ‘naijiria’, and ‘naijiriya’) and four local spelling variants
for “Hausa” (‘bahaushe’, ‘bahaushiya’, ‘hausawa’, and ‘haoussa’).
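
A simple way to picture this search step is as a set-membership test over the words of each message. The sketch below is illustrative only: the variant spellings are those listed above (de-duplicated), the inclusion of the base terms ‘nigeria’ and ‘hausa’ is an assumption, and the helper name `mentions` and the word-splitting rule are hypothetical rather than taken from the original pipeline.

```python
import re

# Spelling variants as listed in the text, de-duplicated; the base terms
# "nigeria" and "hausa" are added here as an assumption.
NIGERIA = {"nigeria", "nijeriya", "najeriya", "naijiriya", "naijiria", "naigeria"}
HAUSA = {"hausa", "bahaushe", "bahaushiya", "hausawa", "haoussa"}


def mentions(text, variants):
    """True if any word of the message is one of the concept's spellings."""
    tokens = re.findall(r"\w+", text.casefold())
    return any(token in variants for token in tokens)
```

Under these assumptions, `mentions("Proud of Naijiria!", NIGERIA)` is true, while a message that names neither the country nor the group matches neither set.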

The  concept  of  “armed  conflict”,  in  contrast,  represents  a  more  difficult  search  task,  as  it 
can  be  referenced  through  a  wide  variety  of  specific  objects  and  actions  (e.g.  ‘stabbing’, 
‘airstrike’,  ‘soldier’,  etc.),  all  of  which  need  to  be  jointly  recognized  as  members  of  the 
overarching  concept.  To  accomplish  this  on  a  cross-linguistic  basis,  I  first  cross  reference 
existing  lexicons  (Harvard  Inquirer,  MPQA)  to  develop  a  list  of  366  English  language  terms 
representing  direct  references  to  objects  and  actions  associated  with  armed  conflict  (see 
Appendix),  taking  care  to  include  all  forms  of  relevant  nouns  and  verbs.  I  then  use  scripted 
access  to  the  Google  Translate  API  (https://translate.google.com/)  to  attempt  to  translate  each 
term into the five most common non-English languages in Nigeria: French, Arabic, Hausa, Igbo,
and Yoruba. The results of this machine translation exercise are shown in Table A1, with blank
cells  indicating  either  that  no  translation  was  possible  or  that  the  original  term  was  selected  as  the 
best  translation.  As  can  clearly  be  seen,  the  French  and  Arabic  translations  achieve  more 
thorough  coverage  than  the  smaller  Nigerian  languages,  but  there  is  good  general  coverage 
across  all  concepts  and  languages.  Collapsing  this  table  into  a  searchable  index  yields  1,195 
unique  search  strings,  which  are  stored  and  indexed  in  a  separate  database  using  the  tokenization 
procedures  described  above. 

For  each  concept  and  each  day,  I  estimate  a  continuous  spatial  surface,  representing  the 
relative  density  of  messages  referencing  that  concept  in  a  particular  place  and  time.  The 
smoothing  is  conducted  using  2-dimensional  binned  Gaussian  kernel  density  interpolation.  For 
each  concept,  for  each  day  I  estimate  a  separate  smoothed  density,  treating  as  separate  points 
each  message  containing  the  concept,  and  then  calculate  a  separate  smoothed  density  surface 
using the full sample of messages, regardless of content. The final values reported for each
concept  are  then  the  concept  density  estimated  at  a  given  location  in  space-time,  divided  by  the 
total  estimated  message  density  at  that  location.  The  result  is  a  smooth  surface  estimating  the 
likelihood  that  a  given  location  will  produce  a  token  of  a  given  concept,  relative  to  the  total 
volume  of  tokens  produced  at  that  location. 
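
The ratio of the two smoothed surfaces can be sketched as follows. This is a simplified stand-in rather than the procedure used for the reported results: it evaluates an unbinned isotropic Gaussian kernel directly at a single location instead of smoothing over a binned grid, and the bandwidth (in degrees) and the names `kernel_density` and `relative_density` are assumptions.

```python
import math


def kernel_density(points, at, bandwidth=1.0):
    """Gaussian kernel density of (lat, lon) points, evaluated at `at`."""
    lat0, lon0 = at
    total = 0.0
    for lat, lon in points:
        dist_sq = (lat - lat0) ** 2 + (lon - lon0) ** 2
        total += math.exp(-dist_sq / (2 * bandwidth ** 2))
    # Normalising constant of a 2-dimensional isotropic Gaussian.
    return total / (2 * math.pi * bandwidth ** 2)


def relative_density(concept_points, all_points, at, bandwidth=1.0):
    """Concept density divided by total message density at one location."""
    denominator = kernel_density(all_points, at, bandwidth)
    if denominator == 0.0:
        return 0.0  # no messages nearby: report zero rather than divide by zero
    return kernel_density(concept_points, at, bandwidth) / denominator
```

If four messages are sent from the same place on a given day and one of them references the concept, the relative density at that place is 0.25, regardless of how many messages were sent elsewhere. Dividing by total volume in this way is what keeps the metric from simply tracking where Twitter usage is heaviest.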

Figure  1  shows  a  color-scale  representation  of  the  smoothed  densities  of  total  message 
volume  and  the  relative  densities  of  the  concepts  of  “Nigeria”  and  “Hausa”,  on  days  at  the 
beginning,  middle,  and  end  of  our  period  of  study,  with  red  indicating  higher  levels  and  yellow 
indicating  lower  levels.  The  green  circles  show  the  actual  locations  of  the  messages  used  to 
produce  the  smoothed  surfaces,  with  larger  bubbles  representing  a  greater  volume  of 
messages. As can clearly be seen, these metrics generate substantial content-based variation
which  is  not  simply  reflective  of  the  underlying  volume  of  messages.  Moreover,  the  geographic 
distribution  of  references  to  these  terms  varies  significantly,  with  references  to  “Hausa” 
occurring  much  more  frequently  in  the  north  of  the  country  where  Hausa  communities  represent 
a  larger  proportion  of  the  population. 

Statistical  Models  and  Results 

In  order  to  draw  inferences  regarding  the  relationship  between  these  metrics  and  the 
emergence  of  collective  violence,  I  estimate  heterogeneous  point  process  models  with  a  Strauss 
inter-point  interaction  function  designed  to  flexibly  capture  patterns  of  spatial  autocorrelation 
without  forcing  the  analyst  to  pre-specify  spatial  units  at  any  particular  resolution  (see  Warren 
(2015)  for  a  discussion).  The  dependent  variable  is  measured  using  the  ACLED  v5  database, 
from  which  I  build  a  list  of  the  locations  of  all  violent  armed  conflict  events  occurring  within 
Nigeria, from September 1st, 2013 to July 31st, 2014 (n = 1,427). For each event, covariate
values are associated with the event by taking the daily smoothed surfaces described above and
averaging across a temporal window stretching back over the previous 30 days. Randomly
generated control points, used for statistical inference, are spread evenly within this
space-time box. Regression modelling then proceeds by comparing the covariate distributions
observed at the random control points to the covariates observed at the actual event locations.
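
The temporal averaging step can be illustrated with a short sketch. It assumes that the window covers the 30 days strictly before the event day (whether the event day itself is included is not stated), and that the daily surface has already been evaluated at the event's location; the name `trailing_mean` and the dictionary layout are hypothetical.

```python
from datetime import date, timedelta


def trailing_mean(daily_values, event_day, window=30):
    """Mean of a daily covariate over the `window` days before `event_day`.

    `daily_values` maps a date to the smoothed surface value at the
    event's location on that date.
    """
    days = [event_day - timedelta(days=k) for k in range(1, window + 1)]
    values = [daily_values[day] for day in days if day in daily_values]
    # Assumption: days with no surface are skipped rather than treated as zero.
    return sum(values) / len(values) if values else None
```

Under this reading, an event on September 1st, 2013 draws on surfaces from August 2nd through August 31st, which is consistent with the event list beginning one month after the start of the message archive.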

The results are presented in Table 1. Model 1 is a baseline specification which includes
only total message density and the interpoint interaction function. Model 2 adds the covariate
surfaces capturing the relative density of our concepts, “armed conflict”, “Nigeria”, and “Hausa.”
Finally, Model 3 adds an interaction term between “Nigeria” and “Hausa.” Taken as a whole,
the results demonstrate that substantial predictive leverage can be gained through metrics derived
from the content of social media messages. Comparing Model 1 to Model 2, we can see that the AIC
score improves (from -3882.43 to -3917.65) with the addition of our content-based metrics,
indicating that the results are not driven simply by differences in the penetration of the medium
in different areas of the country. Rather, it appears that variation in the content of the messages
provides additional predictive leverage over the likely locations of armed conflict events. In
particular, the positive and statistically significant (p < 0.001) coefficient for “armed conflict”
indicates that areas where people use more violent discourse are also areas that are more likely
to generate actual events of violence. Moreover, the negative and significant coefficient for
“Nigeria” in Model 2 (p < 0.01) indicates that areas where people make more frequent references
to the country as a whole are less likely to generate internal collective violence. In contrast,
the positive and significant coefficient for “Hausa” in Model 2 (p < 0.001) indicates that
discursive references to this polarizing sectarian identity are systematically associated with
higher levels of actual violence. Finally, the positive and significant coefficient for the
interaction term between “Nigeria” and “Hausa” in Model 3 (p < 0.001) indicates that the most
violence-prone configuration of these variables occurs in areas where “Nigeria” and “Hausa” are
referenced with high joint density.
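As a reading aid for the interaction term, the arithmetic implied by the Table 1 coefficients can be worked through in a short sketch. It assumes the covariates enter the model's linear predictor additively, so that the marginal effect of one concept is its own coefficient plus the interaction coefficient times the density of the other concept. The scale of the covariate surfaces is not stated in the text, so the crossover values below are meaningful only on whatever scale the surfaces were measured. Function and variable names are illustrative.

```python
# AIC values and Model 3 coefficients copied from Table 1.
AIC = {"Model 1": -3882.43, "Model 2": -3917.65, "Model 3": -3984.23}

B_NIGERIA = -4.4354      # "Nigeria", Model 3
B_HAUSA = -1.2323        # "Hausa", Model 3
B_INTERACTION = 1.7509   # "Nigeria" x "Hausa", Model 3

# Change in AIC between successive models (negative = improvement).
delta_1_to_2 = AIC["Model 2"] - AIC["Model 1"]   # approximately -35.22
delta_2_to_3 = AIC["Model 3"] - AIC["Model 2"]   # approximately -66.58


def marginal_effect_hausa(nigeria_density):
    """Change in the linear predictor per unit of "Hausa" density,
    holding "Nigeria" density fixed (additive-predictor assumption)."""
    return B_HAUSA + B_INTERACTION * nigeria_density


def marginal_effect_nigeria(hausa_density):
    """Change in the linear predictor per unit of "Nigeria" density,
    holding "Hausa" density fixed (additive-predictor assumption)."""
    return B_NIGERIA + B_INTERACTION * hausa_density


# Density of the other concept at which each marginal effect changes sign.
crossover_for_hausa = -B_HAUSA / B_INTERACTION      # approximately 0.704
crossover_for_nigeria = -B_NIGERIA / B_INTERACTION  # approximately 2.533
```

Under these assumptions, the Model 3 marginal effect of “Hausa” is negative where “Nigeria” density is low and turns positive above roughly 0.70. That pattern is consistent with the text's statement that high joint density is the most violence-prone configuration.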

Conclusion 

The  results  presented  here  thus  provide  new  evidence  for  the  importance  of 
communication  dynamics  in  the  production  of  collective  violence.  Moreover,  they  demonstrate 
that  it  is  possible,  even  with  very  simple  metrics,  to  begin  to  differentiate  forms  of  collective 
discourse  that  are  more  prone  to  be  associated  with  actual  events  of  collective  violence.  In 
particular,  the  evidence  presented  here  indicates  that  discourses  revolving  around  integrative 
national  identities  are  likely  to  be  less  prone  to  the  generation  of  collective  violence  than 
discourses  that  focus  on  divisive  sectarian  identities,  while  also  pointing  to  the  possibility  that  it 
is  actually  the  confluence  of  these  categories  that  is  most  strongly  associated  with  the  production 
of  violence. 

However,  based  on  the  very  preliminary  results  presented  here,  a  number  of  questions 
remain.  While  these  associations  generate  substantial  predictive  leverage,  it  is  not  clear  whether 
they  arise  due  to  “reflective”  mechanisms,  through  which  discourse  comes  to  mirror  existing 
events  on  the  ground,  or  due  to  “constructive”  mechanisms,  through  which  discourse  produces 
events  that  would  not  otherwise  have  occurred.  Moving  forward,  closer  attention  to  the  temporal 
dynamics  underlying  these  processes  may  make  it  possible  to  begin  to  disentangle  the  direction 
of  these  causal  arrows. 
Figure 1. Relative Spatio-Temporal Density of Discursive Concepts

[Figure not reproduced; date labels: 08-01-2013 and 07-31-2014.]
Table 1. Point Process Models of Violent Event Locations

                          Model 1         Model 2         Model 3

Total Density             8.1990 ***      13.5342 ***     25.6532 ***
                          (1.1181)        (2.7577)        (3.2742)

"armed conflict"                          3.2000 ***      3.3299 ***
                                          (0.3618)        (0.3698)

"Nigeria"                                 -0.8351 **      -4.4354 ***
                                          (0.3105)        (0.5416)

"Hausa"                                   0.1518 ***      -1.2323 ***
                                          (0.0246)        (0.1806)

"Nigeria" x "Hausa"                                       1.7509 ***
                                                          (0.2226)

Intercept                 1.6191 ***      -0.8584 *       2.0897 ***
                          (0.1284)        (0.3770)        (0.5674)

Interpoint Interaction    0.0027 ***      0.0024 ***      0.0022 ***
                          (0.0003)        (0.0004)        (0.0004)

AIC                       -3882.43        -3917.65        -3984.23

Note: Coefficients from heterogeneous point process models. Standard errors in parentheses.
*p < 0.05, **p < 0.01, ***p < 0.001
Appendix


Table A1. Nigerian Multilingual Dictionary of “armed conflict”


English 

Arabic 

French 

Hausa 

Igbo 

Yoruba 

aggression 

agression 

ta’adi 

awakpo 

ifinran 

aggressions 

agressions 

ta'addancin 

aggressor 

agresseur 

tsokanar  zalunci 

ocho 

aggressors 

agresseurs 

tsokana 

ebido 

airstrike 

sc$E  '‘Jt 

raid  aerien 

harin  jirgin  sama 

airstrikes 

frappes  aeriennes 

harin  na  jiragen 

ak  47 

ak47 

ambush 

embuscade 

kwanto 

ambushed 

embuscade 

kwanton 

echechiela 

nibon 

ambushes 

embuscades 

kwanton  bauna 

neru  nbi 

ebu 

ambushing 

embuscade 

annihilate 

SjIU 

annihiler 

warware 

ekpochapu 

annihilated 

aneanti 

shafe 

n'iyi 

odi 

annihilates 

annihile 

shafe 

annihilating 

annihilant 

halakar 

kpochapu 

annihilation 

rushewa 

ebibi 

antagonism 

antagonisme 

abotar  gaba 

imegidesi 

antagonist 

antagoniste 

na-eti  okpo 

antagonists 

fjL 

antagonistes 

na-aku  okpo 

armament 

armement 

makamai 

nke  zajon 

iham 

armaments 

armements 

ngwa  agha 

armed 

arme 

agha 

ologun 

armies 

armees 

sojojin 

usuu  ndj  agha 

ogun 

arming 

armement 

tara  makamai 

igbochi  ngwa  agha  ijuputa 

armored 

blinde 

sulke 

armoured 

blinde 

sulke 

army 

armee 

sojojin 

agha 

ogun 

artillery 

artillerie 

manyan  bindigogi 

ogbunigwe 

assassinate 

Jg£l 

assassiner 

kisa 

igbu  mmadu 

assassinated 

assassine 

kashe 

egbu 

assassinates 

assassine 

assassinating 

Jq£l 

assassinant 

kisan  gilla 

assassination 

J43 

assassinat 

kisan  gilla 

mgbu  mmadu 

assassinations 

assassinats 

aikata  kashe-kashen 

ipania 

assault 

elAjSil 

agression 

hari 

wakpo 

sele  si 

assaulted 

agresse 

auka 

tiri 

nri  ipalara 

assaulting 

*Ia£VI 

assaut 

n'iwakpo 

assaults 

agressions 

hari 

ema  esjn 

attack 

attaque 

hari 

agha 

kolu 

attacked 

fE1* 

attaque 

sun  kai  hari 

wakpoo 

kolu 

attacker 

fZ)‘r 

attaquant 

ebibi 

attackers 

attaquants 

maharan 

kpara 

attacking 

Ve'v 

attaquer 

kai  hare  hare 

awakpo 

baa 

attacks 

attaques 

kai  hare-hare 

ogu 

ku 

barricade 

barikadi 

mgbochi 

barricaded 

barricade 

mechibido 

barricades 

barricading 

(J-3'  J^'4-1 

bastingages 

imechibido 

battalion 

bataillon 

bataliya 

ewu 

battalions 

C_j^S 

bataillons 

ororun 

battle 

*4 & 

bataille 

yaki 

agha 

ogun 

battled 

lutte 

fama 

agha 

battlefield 

champ  de  bataille 

fagen  fama 

n'ogbo  agha 

ogun 

battlefields 

jugjiiiyoi 

champs  de  bataille 

fagen 

battlefront 

battlefronts 

Ji^ldile^p 

champs  de  bataille 

battleground 

iu“ 

champ  de  bataille 

a  fafata 

agha 

battlegrounds 

champs  de  bataille 

dauki  ba  dadi 

battles 

batailles 

fadace-fadace 

agha 

ogun 

battleship 

navire  de  guerre 

jirgin  ruwa  na  soja 

agha 

battleships 

Ej'jSrXj 

cuirasses 

battlespace 

s4jq.ji 

bataille 

battlespaces 

espaces  de  combat 

battling 

combattre 

alu 

njijadu 

behead 

•u*j 

decapiter 

beheaded 

l4j  fc-Ks 

decapite 

fille  kansa 

isi 

be 

beheading 

u'b  fc-ki 

decapitation 

fille 

beheadings 

o*j  jjJt-Ks 

decapitations 

belligerent 

belligerant 

mmuQ  ilu  ogu 

belligerents 

ujy't'f’J' 

belligerants 

bled 

‘-•ju 

saigne 

zub  da  jini 

leemoo 

bleed 

saigner 

jinni 

igba  obara 

bleeding 

l-iLpU 

saignement 

na  jini 

obara  ogbugba 

eje 

bleeds 

‘-‘jtig 

saigne 

blockade 

Jl>z 

blocus 

kawancen 

mgbpchi 

blockaded 

bloque 

npchibidpro  anpchibidp 

blockades 

JotJI 

blocus 

blockading 

Jot 

blocus 

blood 

sang 

jini 

pbara 

eje 

bloodshed 

effusion  de  sang 

zubar  da  jini 

na-awufu  pbara 

bloodstain 

tache  de  sang 

bloodstained 

tache  de  sang 

pbara  tetpro 

bloodstains 

taches  de  sang 

pbara 

bloody 

fb 

sanglant 

na  jini 

pbara 

itajesile 

bomb 

bombe 

bam 

bpmbu 

bombu 

bombed 

bombarde 

bamai 

turn  bpmbu 

bomber 

bombardier 

ptu  bpmbu 

bombers 

opcy i)l 

bombardiers 

kai  harin 

atu  bpmbu 

bombing 

bombardement 

bom 

bpmbu 

bombu 

bombings 

attentats  a  la  bombe 

bom 

bombs 

bombes 

ragargaza 

ado- 

brigade 

c-ljJ 

birged 

brigeedi 

egbe  omo  ogun 

brigades 

bullet 

balle 

harsashi 

ibon 

bullets 

balles 

harsasai 

mgbp 

awako 

casualties 

victimes 

jikkata 

pnwu 

faragbogbe 

casualty 

victime 

mai  hasara 

a  na-egbu 

combat 

J'4=j 

fama 

ogu 

ija 

combatant 

combattant 

n'Nu  Agha 

combatants 

U 

combattants 

na-alu  agha 

ogun 

conflict 

conflit 

rikici 

esemokwu 

rogbodiyan 

conflicts 

conflits 

rikice-rikice 

esemokwu 

ija 

confrontation 

affrontement 

adawa 

confrontations 

ese  okwu 

coup 

juyin  mulki 

kuu 

coups 

juyin  mulki  ne 

damage 

dommage 

mmebi 

bibaje 

damaged 

endommage 

lalace 

mebiri  emebi 

ti  baje 

damaging 

'jjo^ 

dommageable 

tareda  zata 

emebiri 

omode 

dead 

mort 

matattu 

nwuru  anwu 

oku 

deadly 

mortel 

na-egbu  egbu 

oloro 

death 

deces 

mutuwa 

onwu 

iku 

deaths 

deces 

mutuwar 

onwu 

iku 

decapitate 

decapiter 

decapitated 

decapite 

decapitates 

l>“L> 

decapite 

decapitating 

u"j 

decapitant 

decapitation 

decapitation 

destroy 

r* 

detruire 

halaka 

ebibi 

destroyed 

cjjfj 

detruit 

halakar 

ebibi 

destroyer 

JHf 

destructeur 

hallakarwa 

mbibi 

apanirun 

destroyers 

hallaka 

ebukoro 

awon  afiniseije 

destroying 

detruisant 

hallaka 

ebibi 

dabaru 

destroys 

detruit 

halaka 

ebibie 

destruction 

halaka 

mbibi 

iparun 

die 

‘-‘JfiS 

mourir 

mutu 

anwu 

ku 

died 

mort 

ya  rasu 

nwuru 

ku 

dies 

meurt 

mutu 

na-anwu  anwu 

ku 

dismember 

demembrer 

dismembered 

demembre 

emekwa 

dismembering 

equarrissage 

dismembers 

demembre 

dying 

mourant 

mutuwa 

na-anwu  anwu 

ku 

enemies 

ennemis 

makiyan 

ire 

ota 

enemy 

ennemi 

makiyi 

enye  ire 

ota 

explosion 

fashewa 

gbawaranu 

bugbamu 

explosive 

explosif 

mgbawa 

ibejadi 

explosives 

explosifs 

nakiyoyi 

fatal 

US 

egbu  egbu 

apani 

fatalities 

deces 

anwu 

fatality 

fatalite 

pdachi 

fatally 

mortellement 

gbagburu 

feud 

querelle 

gaba 

esemeokwu 

orilede 

feuded 

rlj%' 

rivalisait 

feuding 

vendetta 

husuma 

mu  awoon 

feuds 

querelles 

fight 

bats  toi 

yaki 

agha 

ija 

fighter 

JJi# 

combattant 

jirgin  saman  soja 

onija 

fighters 

JJj# 

combattants 

mayakan 

aluse 

awon  onija 

fighting 

j*a)i 

combat 

fada 

Pgu 

ija 

fights 

4ji&Ji 

combats 

ta  fada 

Hu  pgu 

nja 

firearm 

arme  a  feu 

ohun  ija 

firearms 

armes  a  feu 

bindigogi 

eji  egbe  agbagbu 

ibon 

firefight 

fusillade 

firefights 

des  echanges  de  tirs 

force 

SjJ 

karfi 

ike 

agbara 

forces 

-Ijjl 

sojojin 

agha 

ologun 

fought 

Jdfi5 

combattu 

suka  yi  jihadi 

agha 

ja 

grave 

JH3 

tombe 

kabari 

ili 

sin 

graves 

tombes 

kaburbura 

ili 

iboji 

grenade 

gurnati 

bombu 

grenades 

J4tSp 

gurnetin 

guerillas 

^X 

guerilleros 

dakarun 

agha  ekpuru 

guerrilla 

guerilla 

yakin 

ekpuru 

gun 

pistolet 

bindiga 

egbe 

ibon 

gunboat 

lssx  i3jj  j 

canonniere 

gunboats 

SLsytJtsjljJJ' 

canonnieres 

gunfire 

JapUbj 

des  coups  de  feu 

bindigar 

gunman 

oV 

tireur 

gunmen 

Ojcdf 

des  hommes  armes 

yan  bindiga 

gunned 

abattu 

gunner 

canonnier 

sojan  igwa 

onye  agha 

gunners 

canonniers 

gunning 

gunpowder 

poudre  a  canon 

guns 

ip'dfeil 

pistolets 

bindigogi 

egbe 

ibon 

gunship 

sgyt 

gunships 

helicopteres  de  combat 

gunshot 

jiupu*! 

coup  de  feu 

harbin  bindiga 

ibon 

gunshots 

des  coups  de  feu 

bindigogi 

uda  egbe 

handgun 

pistolet 

handguns 

J' 

armes  de  poing 

egbe  mkpumkpu 

hostiles 

hostilities 

hostilites 

tashin 

igboro 

hostility 

hostility 

rashin  jituwa 

iro 

igbogunti 

infantry 

s'ufcS' 

infanterie 

dakaru 

bipu 

elese 

ied 

bama-bamai 

ieds 

bamai 

injure 

C->E 

blesser 

cuta 

emeru 

ipalara 

injured 

£J£ 

blesse 

ji  rauni 

meruru  ahu 

farapa 

injures 

Cjld^>A 

blesse 

emeru 

injuries 

blessures 

raunin  da  ya  faru 

unan 

nosi 

injuring 

blessant 

jikkata 

memo 

injury 

JJCK" 

blessure 

rauni 

mmeru 

ipalara 

insurgencies 

insurrections 

hare  haren 

insurgency 

insurrection 

tayar  da  kayar  baya 

insurgent 

insurge 

hare 

insurgents 

insurges 

maharan 

invade 

envahir 

mamaye 

wakporo 

gbogun 

invaded 

t 

envahi 

mamaye 

wakporo 

yabo 

invader 

j'i 

envahisseur 

mai  mamaye 

onye  mbusoagha 

invaders 

SljgJI 

envahisseurs 

mwakpo 

invades 

jj&f 

envahit 

ta  mamaye 

awakpoo 

invading 

•LgjlgJI 

envahisseur 

na-awakwasj 

invasion 

jji 

mamayewa 

mbuso  agha 

ayabo 

invasions 

mamayar 

mwakpo 

kill 

<_k3 

tuer 

kashe 

igbu 

pa 

killed 

<_k3 

tue 

kashe 

gburu 

pa 

killer 

tueur 

kisa 

egbu  egbu 

apani 

killers 

sJtJil 

tueurs 

kisan 

aporo 

killing 

<_k3 

meurtre 

kashe 

okowot 

pipa 

killings 

J3JI 

tueries 

kashe-kashe 

kills 

Jmiss 

tue 

kashe 

egbu 

pa 

land  mine 

LSU-^J  <*jJ 

mine  terrestre 

ala  m 

ile  mi 

land  mines 

Si^jVI  flg&l 

les  mines  terrestres 

kasar  mahakai 

ogbunigwe 

ile  maini 

landmine 

s^jVI  fl jJVl 

les  mines  terrestres 

landmines 

SyyijV!  fl jJVl 

les  mines  terrestres 

nakiyoyin  da 

lese 

machinegun 

mitraillette 

machineguns 

mitrailleuses 

maim 

£J£ 

mutiler 

maimu 

maimed 

estropie 

nkwaru 

abuku 

maiming 

mutilation 

sofo  ti 

maims 

mutile 

marines 

juplfjl 

sojin  rundunar  jiragen  ruwa 

marini 

massacre 

kisan  kiyashin 

mgbuchapu 

ipakupa 

massacred 

massacres 

karkashe 

massacres 

jj'CpcJ' 

kisan  kiyashi 

massacring 

massacrant 

militarize 

milita  riser 

sojoji 

militarized 

'o 

militarisee 

yan  bindiga  a 

agha 

militarizing 

°jA)t 

militarisation 

military 

lSj4j£ 

militaire 

soja 

agha 

ologun 

missile 

makami  mai  linzami 

agha 

misaili 

missiles 

t^JO* 

jifa 

aku  uta 

mortar 

uj'° 

mortier 

turmi 

ngwa  agha 

amp 

mortars 

tjjlo 

mortiers 

murder 

assassiner 

kisankai 

igbu  ochu 

iku 

murdered 

<_k3 

assassine 

kashe 

gburu 

paniyan 

murderer 

Jdy 

assassin 

kisan  kai 

na-egbu  ochu 

apaniyan 

murderers 

SJtsJI 

meurtriers 

kisankai 

na-egbu  ochu 

a  paniyan 

murdering 

Jlg£l 

meurtre 

kashe 

-egbu  ochu, 

murderous 

meurtrier 

suka  kai 

igbu  ochu 

ipaniyan 

murderously 

murders 

meurtres 

kisan  kai 

ikwa 

mutilate 

jiXai 

mutiler 

daddatsa  gawa 

ebepu 

mutilated 

°0  jL>f  J' 

mutile 

mutilates 

Sjfif 

mutile 

mutilating 

-SsJO- 

mutilant 

jikin 

naval 

sojan  ruwa 

to  oko 

navies 

marines 

navy 

sojojin  ruwa 

agha  mmiri 

ogagun 

ordinance 

fJLWf 

ordonnance 

farilla 

ukpuru 

ilana 

ordinances 

ordonnances 

hukuncen 

idajo 

pistol 

pistolet 

bindiga 

egbe 

ibon 

pistols 

pistolets 

obere  egbe 

platoon 

SJ  J-f 

mutanena  su  ka 

platoons 

pelotons 

raid 

‘J  i 

hari 

wakporo 

igbogun  ti 

raided 

perquisitionne 

kai  hari 

wabara 

raiding 

u&yi 

raids 

hari 

egbe  ogun 

raids 

^ijyi 

hare-hare 

rape 

sAt-fel 

viol 

fyade 

n'ike 

ifipabanilopo 

raped 

viole 

fyade 

n'ike 

lopo  ti 

rapes 

Ju-feVI 

viols 

ifipabanilopo 

raping 

ulu^l 

viol 

fyade 

rapist 

violeur 

yarsu  fyaden 

rapists 

LjiSU-ofrJl 

violeurs 

afipabanilo 

rebel 

rebelle 

yan  tawayen 

enupu  isi 

sote 

rebelled 

rebelles 

tawaye 

nupuru  isi 

sote 

rebelling 

rebeller 

tawaye 

enupu  isi 

ti  sote 

rebellion 

JJr° 

rebellion 

tawayen 

nnupuisi 

isote 

rebellions 

rebellions 

tawayen 

nupu  isi 

rebellious 

rebelle 

enupu  isi 

plote 

rebels 

jljCjJ1 

rebelles 

yan  tawayen 

nnupuisi 

olote 

revolt 

SjjJj 

revolte 

yi  tawaye 

nnupuisi 

spte 

revolts 

revoltes 

yin  tawaye 

nnupuisi 

revolver 

o“Jur 

egbe 

revolvers 

o-olcPof  c]' 

rifle 

e<  fjA-i 

fusil 

bindiga 

egbe 

ibon 

rifleman 

fusilier 

riflemen 

•VjJ 

tirailleurs 

rifles 

fusils 

bindigogi 

awon  iru  ibon 

riot 

emeute 

ntjme 

isote  na 

rioted 

se  sont  revoltes 

rioter 

emeutier 

rioters 

emeutiers 

masu  zanga-zangar 

rioting 

emeutes 

riots 

emeutes 

tarzoma 

rocket 

£jJo-= 

fusee 

roka 

roketi 

rocketfire 

QStaxWjirf) 

rocketlauncher 

lance-roquettes 

rocketlaunchers 

rockets 

roquettes 

roka 

tammy 

security 

securite 

tsaro 

nche 

aabo 

shelled 

jJef 

decortiquees 

shelling 

^0^3 

bombardement 

wuta  ya  janyo 

shotgun 

fusil  de  chasse 

ibon 

shotguns 

c3J  >&)! 

fusils  de  chasse 

slaughter 

abattage 

kashe 

akwu 

slaughtered 

abattus 

yanka 

gbuo 

pa 

slaughtering 

C* 

abattage 

yanka 

ogbugbu 

eran 

slaughters 

-O'er 

tueries 

yanka 

small  arms 

petites  armes 

kananan  makamai 

obere  ogwe  aka 

kekere  apa 

sniper 

ual.:;q 

tireur  isole 

maharbi 

snipers 

makasa 

orukoo 

soldier 

soldat 

soja 

agha 

jagunjagun 

soldiers 

soldats 

sojoji 

agha 

ogun 

stabbed 

poignarde 

sukan 

adu 

leyiti 

stabbing 

elancement 

caka 

jma 

nibi 

strike 

greve 

yajin 

iku 

idasesile 

strikes 

greves 

buga 

etiwapu 

dasofo 

striking 

frappant 

daukan  hankali 

putara  ihe 

idase 

struck 

S-iJCK1 

frappe 

bugi 

gburu 

lu 

suicidal 

suicidaire 

igbu  onwe 

suicide 

j'c^ 

kashe  kansa 

igbu  onwe 

ara 

terror 

jVI 

terreur 

tsoro 

oke  ujo 

eruolorun 

terrorise 

i— jle  jl 

terroriser 

ta'ada 

menyeujo 

terrorised 

terrorise 

terrorises 

±*JLS 

terrorise 

terrorising 

1— jIo  jl 

terrorisant 

tayar  da  hankalin 

terrorism 

i— ilejl 

terrorisme 

ta'addanci 

iyi  oha  egwu 

ipanilaya 

terrorist 

(/4s  j) 

terroriste 

'yan  ta'adda 

eyi  oha  egwu 

apanilaya 

terrorists 

ij^A  jVI 

terroristes 

'yan  ta'adda 

eyi  oha  egwu 

onijagidijagan 

terrorize 

t—ilejl 

terroriser 

ta'ada 

menyeujo 

terrorized 

terrorise 

terrorizes 

terrorise 

barazana, 

terrorizing 

i— jIo  jl 

terrorisant 

tayar  da  hankalin 

threat 

menace 

barazana 

iyi  egwu 

irokeke 

threaten 

menacer 

barazana 

ize 

deruba 

threatened 

eJJo 

menaces 

barazana 

egwu 

ewu 

threatening 

menapant 

barazana 

na-eyi  egwu 

ihal 

threateningly 

menapant 

egwu 

threatens 

A5#(£ 

menace 

barazana 

egwu 

irokeke 

threats 

dll  t^jl 

menaces 

barazana 

egwu 

irokeke 

troop 

troupe 

kungiya 

troops 

troupes 

dakarun 

agha 

enia 

victim 

victime 

wanda  aka  azabtar 

aja 

njiya 

victims 

victimes 

wadanda  ke  fama 

metutara 

olufaragba 

violence 

tashin  hankali 

ime  ihe  ike 

iwa-ipa 

violent 

erne  ihe  ike 

iwa 

violently 

violemment 

ike 

war 

^X 

guerre 

yaki 

agha 

ogun 

warfare 

^X 

guerre 

yaki 

agha 

yce 

warfighter 

combattant 

warfighters 

•JQ  cjl 

combattants 

warmonger 

belliciste 

warmongers 

bellicistes 

warplane 

s 

avion  de  combat 

warplanes 

»rsac  01  jty* 

avions  de  combat 

jirage 

warred 

Mj'e 

guerroye 

yaki 

agha 

n  gbogun  ti 

warring 

JJi# 

en  guerre 

yake 

ebu  agha 

warrior 

^j'cr 

guerrier 

jagunjagun 

warriors 

usy'tfJ 

guerriers 

dike 

wars 

vjjcJi 

guerres 

yake-yake 

agha 

ogun 

warship 

\ sdc 

navire  de  guerre 

warships 

°c£X  Cni* 

navires  de  guerre 

wartorn 

weapon 

arme 

makami 

ngwa  agha 

multani 

weaponry 

armes 

makamai 

ngwa  agha 

weapons 

armes 

makamai 

ngwa  agha 

ohun  ija 

wound 

CX 

blessure 

rauni 

onya 

egbo 

wounded 

&X 

blesses 

rauni 

meruru 

ti  o  gbogbe 

wounding 

CX 

blessant 

ji  masa  rauni 

wounds 

CJX^1 

plaies 

raunuka 

onya 

ogbe 